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Abstract 

Finite mixture of skew distributions have emerged as an effective tool in modelling 
heterogeneous data with asymmetric features. With various proposals appearing rapidly 
in the recent years, which are similar but not identical, the connections between them and 
their relative performance becomes rather unclear. This paper aims to provide a concise 
overview of these developments by presenting a systematic classification of the existing 
skew distributions into four types, thereby clarifying their close relationships. This also 
aids in understanding the link between some of the proposed expectation-maximization 
(EM) based algorithms for the computation of the maximum likelihood (ML) estimates 
of the parameters of the models. The final part of this paper presents a comparison 
of the performance of these skew mixture models in clustering real datasets, relative to 
other non-elliptically contoured clustering methods and associated algorithms for their 
implementation. 

1 Introduction 

In recent years, non-Gaussian distributions have received substantial interest in the statistics 
literature. The growing need for more flexible tools to analyze datasets that exhibit non-normal 
features, including asymmetry, multimodality, and heavy tails, have led to intense development 
in non-normal model-based methods. In particular, finite mixtures of skewed distributions have 
emerged as a promising alternative to the traditional Gaussian mixture modelling. They have 
been successfully applied to numerous datasets from a wide range fields, including the medical 
sciences, bioinformatics, environmetrics, engineering, economics, and financial sciences. 

The rich literature and active discussion of skewed distributions was initiated by the pioneer- 
ing work of Azzalini (1985), in which the univariate skew normal distribution was introduced. 
Following its generalization to the multivariate skew normal distribution in Azzalini and Dalla 
Valle (1996), the number of contributions have grown rapidly. The concept of introducing ad- 
ditional parameters to regulate skewness in a distribution was subsequently extended to other 
parametric families, yielding the skew elliptical family; for a comprehensive survey of skew dis- 
tributions, see, for example, the articles by Azzalini (2005), Arellano- Valle and Azzalini (2006), 
Arellano- Valle et al. (2006), and also the book edited by Genton (2004). 

Besides the skew normal distribution, which plays a central role in these developments, 
the skew t-distribution has also received much attention. Being a natural extension of the t- 
distribution, the skew t-distribution retains reasonable tractability and is more robust against 
outliers than the skew normal distribution. Finite mixtures of skew normal and skew t- 
distributions have been studied by several authors, including Lin et al. (2007a,b), Pyne et al. 
(2009), Basso et al. (2010), Fruhwirth-Schnatter and Pyne (2010), Lin (2010), Cabral et al. 
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(2012), Vrbik and McNicholas (2012), and Lee and McLachlan (2012), among others. With the 
existence of so many proposals, with their various characterizations of skew normal and skew 
t-distributions, it becomes rather unclear how these proposals are related to each other, and to 
what extent can the subtle differences between them have in practical applications. 

This paper provides a concise overview of various recent developments of mixtures of skew 
normal and skew t-distributions, and demonstrates the performance of these distributions in 
clustering real datasets in comparison with other skew mixture models. We first present a 
systematic classification of multivariate skew normal and skew t-distributions, with special 
references to those used in various existing proposals of finite mixture models. We then illustrate 
the relative performance of these models and other related algorithms by applications to some 
real datasets. 

Recently, Lee and McLachlan (2011) referred to the skew distribution of Pyne et al. (2009) 
as the 'restricted' form of skew distribution, and the class of skew elliptical distributions of Sahu 
et al. (2003) as having the 'unrestricted' form. While this terminology was later briefly discussed 
in Lee and McLachlan (2012) when outlining the equivalence between the skew distributions of 
Azzalini and Dalla Valle (1996), Pyne et al. (2009), and Basso et al. (2010), further details were 
not given. This papers aims to fill this gap. We shall adopt the above terminology, and expand 
this idea further to classify more general forms of skew distributions, namely, the 'extended' 
and 'generalized' forms. 

The remainder of this paper is organized as follows. In Section 2, we present the classifica- 
tion scheme for multivariate skew normal and skew t-distributions, clarifying the connections 
between various variants. Next, we discuss the development of currently available algorithms 
for fitting mixtures of multivariate skew normal and skew t-distributions in Section 3, point- 
ing out the equivalence between some of these algorithms. Section 4 presents an application 
to automated flow cytometric analysis, and comparisons are made with the results of other 
model-based clustering methods. Finally, some concluding remarks are given in Section 5. 

2 Classification of multivariate skew normal and skew 
t-distributions 

2.1 Multivariate skew normal distributions 

Since the seminal article by Azzalini and Dalla Valle (1996) on the multivariate skew normal 
(MSN) distribution, numerous 'extensions' of the so-called skew normal distribution have ap- 
peared in rapid succession. The number of contributions are now so many that it is beyond the 
scope of this paper to include them all here. However, most of these developments can be con- 
sidered as special cases of the fundamental skew normal (FUSN) distribution (Arellano- Valle 
and Genton, 2005), and can be systematically classified into four types, namely, the restricted, 
unrestricted, extended, and generalized forms. 

We begin by briefly discussing the FUSN distribution, since it encompasses the first three 
forms of MSN distributions. The FUSN distribution itself is a generalized form of the MSN 
distribution. It can be generated by conditioning a multivariate normal variable on another (uni- 
variate or multivariate) random variable. Suppose Y\ ~ N p (0,T,) and Y is a g-dimensional 
random vector. Adopting the notation as used in Azzalini and Dalla Valle (1996), we let 
Yi | Y > be the vector Yi if all elements of Y is positive and — Y"i otherwise. Then 
Y — ix + (Y\ | Yq + t > 0) has a FUSN distribution. It is important to note that l^o is 
not necessarily normally distributed, but in the restricted, unrestricted, and extended cases, it 
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Case 


Notation 


Definition 


Examples 


restricted 


rMSN 


q = 1, r = 0, and 


Y 1 




A-MSN, B-MSN, SNI-SN, P-MSN 


unrestricted 


uMSN 


q = p, t = 0, and 


Y Q 
Y 1 




S-MSN, G-MSN 


extended 


eMSN 


r ^ 0, and ^° ~ N q+p 


ESN, CSN, HSN, SUN 


generalized 


gMSN 


Y\ is normally distributed 


FUSN, GSN, FSN, SMSN 



Table 1: Classification of multivariate skew normal distributions. 



is restricted to be a random normal variate. The parameter r G R 9 , known as the extension 
parameter, can be viewed as a location shift for the latent variable Y . When the joint distri- 
bution of Y 1 and Y is multivariate normal, the FUSN reduces to a location-scale variant of 
the canonical FUSN (CFUSN) distribution, given by 

Y = (Y 1 | Y > 0), (1) 

where 



" Y ' 


~ ( 


T 




r 


A T " 


. Y l 




1 


A 


£ 



(2) 



where r is a g-dimensional vector, /it is p- dimensional vector, r is a q x g scale matrix, A is an 
arbitrary p x q matrix, and £ is a g x g scale matrix. 

The restricted case corresponds to a highly specialized form of (2), where Y is restricted 
to be univariate (that is, q — 1), r = 0, and T = 1. In the unrestricted case, both Yq 
and Yi has a multivariate normal distribution. The extended form has no restriction on the 
dimensions of Y , but r can be a non-zero vector. When Yq is not normally distributed, the 
density of Y has the generalized form. A summary of some of the existing multivariate skew 
normal distributions is given in Table 1 and 2, where rMSN , uMSN, eMSN, and gMSN refer to 
the restricted, unrestricted, extended, and generalized version, respectively, of the multivariate 
skew normal distribution. The list is not exhaustive, and the names appearing in the final 
columns are representative examples only. 



2.1.1 Restricted multivariate skew normal distributions 

The restricted case is one of the simplest multivariate forms of the FUSN distribution. The 
latent variable Yq is assumed to be a univariate random normal variable, and its correlation with 
Yi is controlled by S* G 1R P . There exists two parallel forms of stochastic representation for 
a MSN random variable, obtained via the conditioning and convolution mechanism (Azzalini, 
2005). In general, the conditioning-type stochastic representation of a restricted MSN (rMSN) 
distribution is given by: 

Y = ti + {Yi |F >0), (3) 



where 



' Y ' 




" " 




1 


8* T ' 









6* 


s* 



(4) 



Alternatively, the rMSN distribution can be generated via the convolution approach, which 
leads to a convolution-type stochastic representation, given by: 



Y 



(5) 
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Abbreviation 


Name 


References 


rMSN 


A-MSN 


Azzalini's MSN 


Azzalini and Dalla Valle (1996) 


B-MSN 


Branco's MSN 


Branco and Dey (2001) 


SNI-SN 


skew normal/independent MSN 


Lachos et al. (2010) 


P-MSN 


Pyne's MSN 


Pyne et al. (2009) 


uMSN 


S-MSN 


Sahu's MSN 


Sahu et al. (2003) 


G-MSN 


Gupta's MSN 


Gupta et al. (2004) 


eMSN 


ESN 


Extended MSN 


Azzalini and Dalla Valle (1996) 


CSN 


Closed MSN 


Gonzalez-Faras et al. (2004) 


HSN 


Hierarchical MSN 


Liseo and Loperfido (2003) 


SUN 


Unified MSN 


Arellano- Valle and Azzalini (2006) 


gMSN 


FUSN 


Fundamental MSN 


Arellano- Valle and Genton (2005) 


GSN 


Generalized MSN 


Genton and Loperfido (2005) 


FSN 


Flexible MSN 


Arellano- Valle and Genton (2010) 


SMSN 


Shape mixture of MSN 


Arellano- Valle et al. (2008) 



Table 2: Summary of the abbreviations of skew normal distributions used in Table 1. 



where Y ~ ^(0,1) and Yi ~ N p (0, E) are independent. Note that the parameters in (5) 
are not identical to that in (3) and (4). The connection between the pairs (6, S) and (<5, £), 
are discussed in more detail in Azzalini and Capitanio (1999). The skew normal distribution 
proposed by Azzalini and Dalla Valle (1996), Branco and Dey (2001), Lachos et al. (2010), and 
Pyne et al. (2009) are essentially identical after reparameterization, and can be formulated as 
the rMSN distribution. 

The first multivariate skew normal distribution (A-MSN) 

The first formal definition of the univariate skew normal distribution dates back to Azzalini 
(1985). However, its extension to the multivariate case did not appear until just over a decade 
later. The widely accepted 'original' multivariate skew normal distribution was introduced by 
Azzalini and Dalla Valle (1996). The density of this distribution, denoted by A-MSN(/x, S, 6) 
(with some changes of notation) takes the form 

f(y; fi, E, 6 A ) = 20 p (y; fi, ^(S^D- 1 RT 1 (y - A*) I 0, 1 - ^iT 1 ^), (6) 

where S is the covariance matrix, R = D l HD 1 is the correlation matrix, 
D = diag(v / Sii, • • • , V^ PP ) is a diagonal matrix formed by extracting the main diagonal ele- 
ments of S, and denotes the ijth entry of S. We let P (.; /x, S) be the p-dimensional normal 
distribution with mean fi and covariance matrix S, and <3>i(.; /z, <r 2 ) is the (univariate) normal 
distribution function of a normal variable with mean fj, and variance a 2 . Here, the parameter 
a = i = G 3? p regulates the skewness of the distribution. To avoid ambiguity in no- 

-\/l-<5 A £ 5 a 

tations, we have appended a subscript to some of the parameters of the rMSN distributions 
throughout this paper. The density (6) was obtained via the conditioning method (3), with 
Y = fx + D(Y\ | Yq > 0), where Yq and Y\ are distributed according to (4). It corresponds to 
the rMSN distribution in (4) with 6* replaced by D8a- This characterization of the MSN dis- 
tribution was adopted in the work of Fruhwirth-Schnatter and Pyne (2010) when formulating 
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finite mixtures of skew normal distributions, and parameter estimation was carried out using a 
Bayesian approach. 



The skew normal distribution of Branco and Dey (B-MSN) 

Branco and Dey (2001) generalized the original skew normal distribution to the class of (re- 
stricted) skew elliptical distributions. In this parameterization, the term D used in the A-MSN 
distribution was removed, resulting in an algebraically simpler form. However, under this vari- 
ant parameterization, a change in scale will affect the skewness parameter. The reader is to 
referred to Arellano- Valle and Azzalini (2006) for a discussion on the effects of adopting this 
parameterization. The skew normal member of this family, denoted by B-MSN, has density 



f(y; /x, 6, E) = 20 p (y; /x, S)$ 1 ( < 5 T S" 1 (y - »); 0, 1 - ^E" 1 *). 

It follows that the conditioning-type stochastic representation for Y 
Y = n + (Yi | Y > 0), where 



Y 



N- 



i+P 








1 d T 

5 E 



(7) 

is given by 



(8) 



and the corresponding convolution-type representation is 



Y = v + S 



Yn 



(9) 



where Y ~ iV(0, 1) and Y 1 ~ N p (0, E — S8 T ) are independent. It can be observed that (7) is 
a reparameterization of the A-MSN distribution. Replacing 6 in (7) with D6a recovers (6). 

The skew normal/independent skew normal distribution (SNI-SN) 

The skew normal Independent (SNI) distributions are, in essence, scale mixtures of the skew 
normal distribution. Introduced by Branco and Dey (2001), and considered further in La- 
chos et al. (2010), the family includes the multivariate skew normal distribution as the basic 
degenerate case, the density of which is given by 



f(y; /x, 6 S , E) = 20 p (y; /x, E)^^ E^y - /x); 0, 1 



(10) 



where is the square root matrix of E; that is, E2E2 = E. We shall adopt the notation 
Y ~ SNI-SN P (/Lt, E, 6s) when Y has density (10). As with all restricted MSN distributions, 
the SNI-SN distribution also enjoys two parallel stochastic representations. This density is very 
similar to (6) and (7), and apparently, is a reparameterization of them. The connection between 
them can be easily observed by directly comparing their stochastic representations. The two 
stochastic representations of the SNI-SN are given by 



Y = » + (Y 1 \Y >0), 



(11) 



and 



fi + ^8s\Y \ + (I p -6s6 s ^Y u 



(12) 



where 



Y 
Y l 








E5<5c 



(13) 
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and Y ~ JV(0, 1), Y"i ~ N p (0, S) are independent. It becomes apparent that (10) becomes 
identical to (7) by replacing the 6 in (10) with Hi^8s- Cabral et al. (2012) described maximum 
likelihood (ML) estimation for the SNI-SN distribution via the expectation-maximization (EM) 
algorithm, and an extension to the mixture model was also studied. 

The skew normal distribution of Pyne et al. (P-MSN) 

In a study of automated flow cytometry analysis, Pyne et al. (2009) proposed yet another 
parametrization of the restricted skew normal distribution. This variant, hereafter referred to as 
the rMSN distribution (as used in Lee and McLachlan (2012)), was obtained as a 'simplification' 
of the unrestricted skew normal distribution described in Sahu et al. (2003) (see Section 2.1.2). 
Its density is given by 

/ (y; n, £*, 5) = 20 p (y; fx, SI) ^ (S T Q- 1 (y - y) ; 0, 1 - d T S}- 1 d) (14) 

£* + 8S T . It follows that the conditioning-type stochastic representation of (14) is 

Y = n + (Y 1 \Y >0), (15) 



where SI 
given by 

where 



Y 



N- 



i+P 








i s T 

S SI 



and the corresponding convolution-type representation is given by 

Y = ijl + S\Yo\+Y 1 , 



(16) 



(17) 



where again Y ~ Ni(0, 1) and Yi ~ N p (0, £*) are independent. It can be observed that (14) 
is equivalent to (7) by replacing £ in (7) with SI. One advantage of this parameterization 
is that the convolution-type representation is in a relatively simple form, and leads to a nice 
hierarchical form which facilitates easy implementation of the EM algorithm for ML parameter 
estimation. 

For ease of reference, we include a summary of the density and stochastic representation of 
the above-mentioned restricted MSN distributions in Table 3 and 4, respectively. 



2.1.2 Unrestricted multivariate skew normal distributions 



The unrestricted case is very similar to the restricted case, except that the scalar latent variable 
is replaced by a p- dimensional normal random vector Yq. Accordingly, the constraint Yq > 
becomes a set of p constraints Y > 0, which implies each element of Y is positive. Similar 
to (3) and (4), the unrestricted MSN (uMSN) distribution can be described by 



Y = n + (Y 1 |V >0), 



where 



Y 1 



N, 



2p 








J.p — » 

A* S* 



The convolution-type representation is analogous to (5), and is given by 

Y = ii + A\Y \ + tY 1 . 



(18) 
(19) 

(20) 
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Distribution 


Density 


A-MSN 
(1996) 


/ (y) = 20 p (y; /i, E) $ x (d^R^D -1 (y - /x) ; 0, 1 - ^iT 1 ^) 
D = diag( v / S^, • • • , v 7 ^)' R = D 1 ZD 1 


B-MSN 
(2001) 


f(y) = 2<P P (y; /x, E^^E"^ - Ji); 0, 1 - ^E" 1 *) 


P-MSN 
(2009) 


f( y ) = 2Mv; /i, (* T n _1 (y - /*) ; o, 1 - ^ir 1 *) 

fi = E* + <5 T <5 


SNI-SN 
(2010) 


/(y) = 20 p (y; fx, S)$x (<5£5T * (y - M ) ; 0, 1 - <^<5 5 ) 



Table 3: Summary of the densities of selected restricted forms of multivariate skew normal 
distributions. 



Distribution 


Conditioning-type representation 


Convolution-type representation 


A-MSN 
(1996) 




Y 1 


Y = /x + £ 






1 * 

1 


o>0) 
1 ^ 
<5a # 


i) 


Y = n + D6 A \Y \ + Y 1 
Y ~ iVi(0,l) 
Y"! ~ 7V P (0,S-D^D) 


B-MSN 
(2001) 




' Y ' 
_ Y l 


Y = »+( 

~ Ni +P ( 


^1 1 

" 



Yo 

i 


>0) 

" 1 6 T ' 
6 E 


) 


F = fX + 5\Yo\ + Y 1 
Y ~ iVi(0,l) 
Y"i ~ JV p (0, E - <5<5 T ) 


P-MSN 
(2009) 




' Yo 
. Y 1 




I'll 

" 



Yo 

1 


>0) 

" 1 S T ~ 

6 n 


) 


f = / x + < 5|r | + r 1 

r ~ iVi(o,i) 

Y 1 ~ iV P (0,E*) 


SNI-SN 
(2010) 


Y = fi+( 
\ Y °] N (\ 


Vi 

5 


V >0) 

1 8 T S ^ \ 

^s s s y 


F = A* + sUslFol + (/ P - « s ^)^i 

F ~ iV(0,l) 
Y x ~ iV p (0,S) 



Table 4: Summary of stochastic representations of selected restricted forms of multivariate skew 
normal distributions. 
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The random vectors Yr 



N p (0,I p ) 



and Yi ~ N p (0, S) are independent. It should be noted 
that the unrestricted MSN distribution is different to the restricted MSN distribution, and the 
two are equivalent only in the univariate case. The skew normal version of Sahu et al. (2003) 
is an unrestricted form of the MSN distribution, with A restricted to be a diagonal matrix. 

The skew normal distribution of Sahu et al. (S-MSN) 

In Sahu et al. (2003), skewness is introduced to a class of elliptically symmetric distribution by 
conditioning on a multivariate variable, which produces a class of (unrestricted) skew elliptical 
distribution. The multivariate skew normal distribution proposed by Sahu et al. (2003), which 
is a member of this family, is given by 



/ (y; S, 6) = 2% (y; /x, SI) % (ACT 1 (y - M ) ; 0, A) , 



(21) 



where A = diag(<5), fl = £ + A 2 , and A = I p — Af2 _1 A. Observe that with this char- 
acterization of the MSN distribution, the density involves the multivariate normal distribu- 
tion function, whereas the restricted forms is defined in terms of the univariate distribu- 
tion instead. Accordingly, the conditioning-type stochastic representation of (21) is given by 

Y = n + (Yi I Yo > 0), where 



Y 
Y 1 



N, 



2p 








I P A 
A O 



(22) 



(23) 



and the convolution-type representation is given by 

Y = A\Y \+Y 1 , 

where Y and Yi are independent variables distributed as Y ~ N p (0, I p ) and Yi ~ N p (0, £), 
respectively. ML estimation for the uMSN distribution, and its mixture case, is studied in Lin 
(2009). 



2.1.3 Extended multivariate skew normal distributions 

Consider the extended skew normal (ESN) distribution, which originates from a selective sam- 
pling problem, where the variable of interest is affected by a latent variable that is truncated to 
an arbitrary threshold. It can be obtained via conditioning by setting Y = fj,+(Y 1 | Y +t > 0), 
where Y 1 and Y are distributed according to (4), which leads to the density 

f(y;H,'E,T) = <l> p (y;n,'E) -Q . (24) 

This expression for an ESN distribution is due to Arnold et al. (1993), and the threshold r is 
known as an extension parameter. With this additional parameter, the normalizing constant 
is no longer a simple fixed value (such as 2 in the restricted case and TP in the unrestricted 
case), but a scalar value that depends on the extension parameter. Although the ESN is 
more complicated than the restricted and unrestricted skew normal distributions, it has nice 
properties not shared by these 'no-extension' cases, including closure under conditioning. 

The ESN distribution represents one of the simplest cases of the extended form. Replac- 
ing the latent variable Y with a g-dimensional version Y leads to the unified skew normal 
(SUN) distribution (Arellano- Valle and Azzalini, 2006). The SUN distribution is an attempt 
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to unify all of the aforementioned skew normal distributions. Its conditioning-type stochastic 
representation is given by (1) and (2). It follows that the SUN density is given by 



f(y, M, S, T, A, t) = p (y; fx, S) 



3> g (t + A T S~ 1 (t/ - m); 0, T - A r S- 1 A) 
$ 9 (r;0,r) 



(25) 



Their construction can also be achieved via the convolution approach, where the g-dimensional 
latent variable Y Q follows a truncated normal distribution with mean r. More specifically, let 
Y 1 ~ iVp(0, S) and Y ~ TN q (r,T) be independent variables, where TN q (T,T) denotes a 
multivariate normal variable with mean vector r and covariance T truncated to the positive 
hyperplane. Then V = + AY + SY"i has an extended MSN density. Note that in this 
case, the skewness parameter is a p x g matrix instead of the p-dimensional vector 6 used in 
the restricted form of the MSN distribution and the uMSN distribution. 

It is not difficult to show that the SUN distribution includes the restricted MSN distribu- 
tions, the unrestricted MSN distributions, and the ESN distribution as special cases. There are 
also various versions of MSN distributions which turns out to be equivalent to the SUN distri- 
bution, including the hierarchical skew normal (HSN) of Liseo and Loperfido (2003), the closed 
skew normal (CSN) of Gonzalez- Far as et al. (2004), the skew normal of Gupta et al. (2004) 
and a location-scale variant of the CFUSN distribution ( Arellano- Valle and Genton, 2005). For 
a detail discussion on the equivalence between these extended forms of MSN distributions, the 
reader is referred to Arellano- Valle and Azzalini (2006). 

2.1.4 Generalized multivariate skew normal distributions 

A further generalization of the extended form of the MSN distribution is to relax the distribu- 
tional assumption of the latent variable Y . For the 'generalized form' of the MSN distribution, 
there are no other restrictions on the MSN density except that the symmetric part must be 
a multivariate normal density, that is, Y\ is normally distributed. This form is very general 
and apparently includes the other three forms discussed above. A prominent example is the 
fundamental skew normal distribution (FUSN), a member of the class of fundamental skew 
distributions considered by Arellano- Valle and Genton (2005). Its density is given by 



where K q = E {Q q (Y)} is a normalizing constant and Q q {y) is a skewing function. Notice 
that the skewing function here is not restricted to the normal family. As mentioned previously, 
the FUSN density can be obtained by defining Y = (Y 1 \ Y > 0), where Y 1 follows the 
p-dimensional normal distribution with location parameter fi and scale matrix £ and Y is 
a q x 1 random vector. Under this definition, K q and Q q {y) is given by P(Y > 0) and 
P(Y > | Fi), respectively. 

An interesting special case of (26) is the location-scale variant of the so-called canoni- 
cal fundamental skew normal (CFUSN) distribution, obtained by taking Y ~ N q (0,I q ) 
and cov("Ki,l^o) = A. In this case, we have Y | Y ~ N q (A T T,^ 1 (y — /it), A), where 
A = I k - A T S _1 A. This leads to the density 



We shall write Y ~ CFUSN Ptq (fi, S, A). It should be noted that by taking q = p, A = diag(<5) 
and S = S* + AA T (where E* corresponds to the scale matrix S in (21)), (27) reduces to the 
unrestricted skew normal density introduced by Sahu et al. (2003). Also, the CFUSN density 
reduces to the restricted B-MSN distribution (7) when q = 1 and A = S. 



f(y; n, S, Q q ) = K q </> p (y;tJi,E)Q q (y), 



(26) 



A) = 2V P (y; ^ S)$ 9 (A T S- 1 (y - /x); 0, A). 



(27) 
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Case 


Restrictions on FUST 


Examples 


restricted 


y 

q = 1, r = and y ~ 


B-MST, A-MST, G-MST, P-MST, SNI-ST 


unrestricted 




S-MST 


extended 


Y t Q+P 


EST, CST, CFUST, SUT 


generalized 


Yi is t-distributed 


FST, GST 



Table 5: Classification of MST distributions. 



2.2 Multivariate skew ^-distributions 

The multivariate skew t-distribution is an important member of the family skew-elliptical distri- 
butions. Like the skew normal distributions, there exists various different versions of the MST 
distribution, which can be naively classified into four broad forms. The MST distribution is of 
special interest because it offers greater flexibility than the normal distribution by combining 
both skewness and kurtosis in its formulation, while retaining a fair degree of tractability in 
an algebraic sense. This additional flexibility is much needed in some practical applications, as 
will be demonstrated in the examples in Section 4. 

In the past two decades, many variants of the multivariate skew t-distribution have been 
proposed. Some notable proposals include the skew t-member of Branco and Dey (2001) 's skew 
elliptical class, the skew t-distribution of Azzalini and Capitanio (2003), the skew t-distribution 
of Gupta (2003), the skew t-distribution of Sahu et al. (2003) 's skew elliptical class, the skew 
normal/independent skew t (SNI-ST) distribution of Lachos et al. (2010), the closed skew t 
(CST) distribution of Iversen (2010), and the extended skew t (EST) distribution of Arellano- 
Valle and Genton (2010). Many of these can be considered as special cases of the fundamental 
skew t (FUST) distribution distribution introduced by Arellano- Valle and Genton (2005). They 
may be classified as 'restricted', 'unrestricted', 'extended', and 'generalized' subclasses of the 
FUST distribution (see Table 5). 



2.2.1 Restricted multivariate skew t-distributions 



The restricted skew t-distribution is obtained by conditioning on a univariate latent variable 
Yq being positive. The correlation between Y 1 and Y is described by the vector 6* . Like the 
MSN distributions, the MST distributions can be obtained via a conditioning and convolution 
mechanism. In general, the restricted MST distribution has a conditioning-type stochastic 
representation given by: 

(28) 





Y = v + (Y 


i 


Y > 0), 


. Y 1 




" " 





r T n 

1 S* 
5* £* _ 



where 



The equivalent convolution-type representation is given by 

Y = fx + ~d\Y \ + £Y 1: 



v 



(29) 



(30) 



where the two random variables Y ~ ti(0,l,v) and Y\ ~ t p (0, X, v) are independent. The 
link between the pairs of parameters (<5*,X*) and (<5, S) are the same as that for the rMSN 
distribution. The skew-t distribution of Branco and Dey (2001), Azzalini and Capitanio (2003), 
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Abbreviation 


Name 


References 


rMSN 


B-MST 


Branco's MST 


Branco and Dey (2001) 


A-MST 


Azzalini's MST 


Azzalini and Capitanio (2003) 


G-MST 


Gupta's MST 


Gupta et al. (2004) 


P-MST 


Pyne's MST 


Pyne et al. (2009) 


SNI-ST 


skew normal/independent MST 


Lachos et al. (2010) 


uMST 


S-MST 


Sahu's MST 


Sahu et al. (2003) 


eMST 


EST 


Extended MST 


Arellano- Valle and Genton (2010) 


CST 


Closed MST 


Iversen (2010) 


SUT 


Unified MST 


Arellano- Valle and Azzalini (2006) 


gMST 


FUST 


Fundamental MST 


Arellano- Valle and Genton (2005) 


GST 


Generalized MST 


Genton and Loperfido (2005) 


FST 


Flexible MST 


Arellano- Valle and Genton (2010) 



Table 6: Summary of the abbreviations of skew t-distributions used in Table 5. 



Gupta (2003), the SNI-ST, and the skew t version given by Pyne et al. (2009) are equivalent 
to (28) up to a reparametrization. 

The skew ^distribution of Branco and Dey (B-MST) 

The skew elliptical class of Branco and Dey (2001) includes a skew t-distribution, which is a 
special case of a scale mixture of the B-MSN distribution. Its density is given by 



f(y) = 2t p (y; fi, S, v) 

T\ (s T ^\y - /*); 0, (^ff) (1 " S T ^6) ,u + P ^, 



(31) 



where d(y) = (y — ii) T H~ l {y — /x) is the squared Mahalanobis distance between y and fi with 
respect to S. Here, we let t(.; fi, S, v) denote the p-dimensional t-density with location vector 
H, scale matrix S, and degrees of freedom u, and Ti(.; /i, a 2 , u) denote the distribution function 
of the (univariate) t-distribution with mean /i, variance a 2 and degrees of freedom v. It can 
be observed from representation (31) that the multivariate skew t-distribution converges to the 
B-MSN density (7) when the degrees of freedom v approaches infinity. 

It follows that Y has a conditioning-type representation given by Y = // + (Y x \ Y > 0), 
where 



Y 
Y 1 








1 8 T 
8 S 



V 



and a corresponding convolution-type representation given by 

Y = n + 5 Y +(I P -X-^6S T X-^Y 1 , 

where Yi ~ t p (0, S, v) and Y ~ t 1 (0, 1, v). It is apparent that (32) is identical to (29). 



(32) 



(33) 
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Name 


Density 


B-MST 
(2001) 


f(y) = 2t p (y; /x, E, u)T x ^Jr\y - M)^^§y 1 0, 1 - d^d, v + p) 
d(y) = (y-tifX- 1 (y-») 


A-MST 
(2003) 


f (y) = 2t p (y; /x, S, v) T x (f^D^ (y - /x) ^^jl 0, 1 - 8 T A R^8 A , v + p) 

D = diag( v / Sn, • • • , a/S pp ), 

R = D l HD 1 
d{y) = (y - ri T H-\y - fx) 


G-MST 
(2003) 


f(y) = 2t p (y; /x, E, u^S^y - n) y/^y 5 0, 1 - 8^8 G , v + p) 
d{y) = {y-ii) T ?:-\y-n) 


P-MST 
(2009) 


f(y) = 2t p (y; n, ft, v)T x (^fT 1 (y - fx) J 1 ^yy 0, 1 - PSl^S, v + p) 

ft = E* + 8*8 
d(y) = (y - tifn-'iy - 


SNI-ST 
(2010) 


/(y) = 2t p (y; n, E, z/)T x [8 T S ^ (y - /x) y/ ; 0, 1 - 6^5, ^ + p) 
d(y) = (w-/*) T S- 1 (y-/i) 



Table 7: Densities of selected restricted forms of multivariate skew t-distributions. 



Distribution 


Conditioning-type representation 


Convolution-type representation 


B-MST 
(2001) 




' Y ' 
_ Y 1 


y = /i 

~ ti+p | 


'[ " 
v.O. 


r >o) 
" 1 s T ' 

'[6 E 




) 


Y = n + 5\Y \ + Y 1 
Y ~ ^(0,1,1/) 
Fi ~ t p (0, E - 88 T , v) 


A-MST 
(2003) 




Y ' 
Y 1 


F = /x-f 

~ ti+p ^ 


" " 
J ' 


r >o) 


]■') 


Y = f ji + DS A \Y \+Y 1 
Y ~ ti(0,l,z/) 
Y - ! ~ f„(0, E - D8 A 8 T A D, u) 


G-MST 
(2003) 




" Y 
_ Y 1 


Y = fi 

~ h+p ^ 


1 
J ' 


Y o >0) 

1 <5£s " 
S(5 G E 


•0 


Y = l i + V8 G \Y Q \+Y 1 
Y ~ ti(0,l,i/) 
Y-i ~ * p (0, E - ES^E, 1/) 


P-MST 
(2009) 




' Y 
_ Y 1 


F = /x 

~ ti+p 


([:] 


r >o) 
" 1 5 T " 
' [ s ft 




) 


F = /x + <5|F |+F 1 
Y 1 ~ f p (0,E» 


SNI-ST 
(2010) 




Y ' 
Y 1 


Y = p 
~ *i+p I ( 


) 1 
) J ' 


Y >0) 
1 <5£E^ 

S^ 5 E 


■•) 


F = /x + E*<5 S |F | + (J p - Ms)**"i 
F ~ *i(0, 1, 1/) 
Fx ~ t p (0,E,i/) 



Table 8: Stochastic representations of selected restricted forms of multivariate skew t- 
distributions. 
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The skew ^distribution of Azzalini and Capitanio (A-MST) 

Azzalini and Capitanio (2003) extended the A-MSN distribution of Azzalini and Dalla Valle 
(1996) to the skew t-case. Its density is given by 

f(y) = 2t p (y; n, E, v) 

T X (slR-'D^ (»-/*); 0, {^f) (1 - S T A RS A ) ,v + p 

(34) 

where d(y) — (y — /^) T S _1 (y — //), R = £>~ 1 SZ>~ 1 is the correlation matrix, D is the 
diagonal matrix created by extracting the main diagonal elements of S. Note again that 
the parameter 8 in (34) is marked with a subscript A to indicate that it is different to 
the definition used in (31) and other rMST distributions. The A-MST density (34) can 
be obtained by a conditioning mechanism, similar to the A-MSN distribution, by setting 
Y = n + D(Y 1 | Y > 0), where 



(35) 







" " 




\ 1 S T A~ 




. Y l 





1 


8 A R 


■■) 



A parallel representation of (34) via the convolution mechanism is given by 



Y = /x + D8j 



Yo 



(36) 



where Y ~ ^(0, 1, v) and Y x ~ t p (0, S - D8 A 8 T A D, v) are independent. In this parameteriza- 
tion, the scale matrix S is partitioned into DRD, making the skewness parameter invariant to 
a change of scale. Setting 8 in (31) to D8a leads to the B-MST distribution (34). This char- 
acterization of the rMST distribution was considered by Fruhwirth-Schnatter and Pyne (2010) 
to define a skew t-mixture model, and an algorithm for parameter estimation was formulated 
using a Bayesian framework. 

The skew t-distribution of Gupta (G-MST) 

In Gupta (2003), another version of the restricted skew t-distribution is defined, starting from 
the A-MSN distribution of Azzalini and Dalla Valle (1996). In this parameterization, the scale 
matrix S is not factored into the product DRD, and the parameter 8a is replaced by D^HSg, 
leading to a density in a slightly simpler algebraic form, given by 

f(y) = 2t p (y; /x, S, v)T x fay - /i); 0, {^ff) (l " ^a) ,v + p) , (37) 

where, as before, d(y) — (y — /x) T S _1 (y — /x). Note that (37) is identical to (31) if we rewrite 
<5 in (31) as S _1 <5g. It follows that the stochastic representation of the G-MST distribution 
(37) can be expressed as 



ix + (Vi | Y > 0) , 



where 



Y 








1 

V8 G 



8 T G X 



V 



(38) 



(39) 
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The skew normal/independent skew t-distribution (SNI-ST) 

The skew t member of the SNI class, denoted by SNI-ST, is introduced as a scale mixture of 
SNI-SN distributions with gamma scale factor (Lachos et al., 2010). Its density is given by 



f(y) = 2t p (y; /x, S, v)T x (s T s ^ (y - fi) ; 0, (^f^) (l " «s*s) + , 



(40) 



where d(y) = (y — /x) T S l (y — fi), and is the square root matrix of £; that is, S^S^ = S. 
The SNI-ST distribution (40) can be generated by taking Y = fi + (Y 1 \ Y > 0), where 








1 



and the corresponding convolution-type representation is given by 

Y = fi + V*6 S \Y \ + (I p - d s 5 T s ) l 2Y 1 , 



(41) 



(42) 



where again Y ~ ti(0,l,u) and Yi ~ t p (0, S,z/) are independently distributed. It can ob- 
served that (40) is equivalent to (31) by replacing 8 in (31) with H^ds- Basso et al. (2010) and 
Cabral et al. (2012) studied, respectively, finite mixtures of univariate and multivariate SNI-ST 
distributions, and derived an ECME algorithm for computing the ML estimates of the model 
parameters. 

The skew t-distribution of Pyne et al. (P-MST) 

In Pyne et al. (2009), a restricted variant of Sahu et al. (2003) 's skew t-distribution was intro- 
duced, and its density is given by 

f(y) = 2t p (y; ft, SI, y)T x (VfT 1 (y - fi) ; 0, {^^f) (l " S^d) , v + p^j , (43) 

where d(y) = (y — ^) T i7 _1 (t/ — fi) and ft = £* + 66 T . We shall refer to the density (43) as the 
rMST distribution. This distribution has straightforward conditioning and convolution-type 
stochastic representations, given by 



and 

respectively, where 



Y = fi + (Y ! | Y > 0) , 
Y = n + S\Y \+Y u 



Y 
Y 1 



t 



i+P 








i s T 



(44) 



and Y \ ~ t p (0, S*, v) and Y ~ ti(0, 1, v). It can be observed that the restricted MST (43) is 
equivalent to (28), where Q is used in place of S. Mixtures of rMST distributions was first stud- 
ied by Pyne et al. (2009), and a closed-form implementation of the EM algorithm was outlined. 
Vrbik and McNicholas (2012) subsequently provided an alternative exact implementation. 

A summary of the correspondence between the parameters used in various versions of the 
restricted MST distribution is given in Table 9. Their densities and stochastic representations 
are listed in Table 7 and 8. 
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rMST 


S* 




B-MST 


s 


£ 


A-MST 


DS A 




G-MST 




£ 


P-MST 


8 


n 


SI-ST 




s 



Table 9: Correspondence between the parametrization of the restricted forms of MST distribu- 
tions. 



2.2.2 Unrestricted multivariate skew t-distributions 

In the unrestricted case, the latent variable Y is a p-dimensional random vector following a 
t-distribution. Under this setting, Y is given in terms of the conditional distribution of Y"i 
given Y is positive. The condition Y > implies that each element of l^o is greater than 
zero. Similar to (28), the unrestricted MST distribution takes the form 



Y = fx + (Y 1 | Y > 0) , 



where 



Y 
Y l 



t 



2p 








I P A 



A £ 

The analogous convolution-type representation is given by 

Y = l x + A\Y \+Y 1 , 



(45) 



(46) 



(47) 



where the two random vectors Y and Y\ are independently distributed as t p (0, I p , v) and 
t p (0, S, v), respectively. This form of the MST distribution is studied in detail in Sahu et al. 
(2003), and its density is given by 



f(y) = 2 p t p (y; fx, tt, u) T p A 1 n~\y - /i); 0, 



v + 



d(y) \ 



v + p J 



A, v + p 



where 

d(y) = 



diag(5), Q 



S + A 



(48) 



AO, 1 A, and 



(y — fx) CI (y — fx). ML estimation for the unrestricted characterization of the MST 



distribution is a difficult computational problem. Lin (2010) used a Monte Carlo (MC) E-step 
when implementing the EM algorithm. Later, Lee and McLachlan (2011), Ho et al. (2012), and 
Lee and McLachlan (2012) proposed an improved implementation using a truncated moments 
approach. 

It is important to point out that, although the rMST distribution (43) was originally ob- 
tained as a restricted variant of the uMST distribution (48), and both can be constructed by 
the conditioning and convolution approach, where (48) uses a p-dimensional latent variable 
instead of a scalar latent variable used in (43), the density (48) does not incorporate (43). The 
two densities are equivalent only in the univariate case. 



2.2.3 Extended multivariate skew t-distributions 

There are parallel versions of the ESN and the SUN distributions for the skew t-distribution, 
known as the extended skew t (EST) distribution (Arellano- Valle and Genton, 2010) and the 
unified skew t (SUT) distribution (Arellano- Valle and Azzalini, 2006), respectively. Their links 
are analogous to that of the skew normal distributions in Section 2.1.3. 
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Model 


Algorithm 


References 


rMSN 


FM-rMSN 
FM-SNI-SN 
FM-A-MSN 


traditional EM 
traditional EM 
Bayesian EM 


Pyne et al. (2009) 
Cabral et al. (2012) 
Friihwirth-Schnatter and Pyne (2010) 


uMSN 


FM-uMSN 


traditional EM 


Lin (2009) 



Table 10: EM algorithms for fitting restricted and unrestricted forms of multivariate skew 
normal mixture models. 

2.2.4 Generalized multivariate skew t-distributions 

Similar to the generalized forms of the MSN distribution, analogous extension to the skew t 
case can be constructed. This includes the FUST distribution and other subclasses of it, as 
well as the very general selection t-distribution (Arellano- Valle et al., 2006). 

3 Mixtures of multivariate skew normal and 
skew ^-distributions 

In the mixture model context, a population is assumed to be composed of a finite number of 
subpopulations. Let Y = Yi, . . . ,Y n denote a random sample of n observations. Then the 
probability density function (pdf) of the g component finite mixture model takes the form 

9 

/(!/;*) = X>/(l/;0ft)> ( 49 ) 

h=l 

where f(y; 6 h ) is the density of the hth. population, and n h its corresponding weight. The mixing 
proportions ir h satisfies n h > (h — 1, . . . , h), and Ylh=\ n h — 1- The vector h consists of the 
unknown parameters in the postulated form of the hth component of the mixture model, and 
^ = (iii, ■ ■ ■ , tt 9 _i, 6 q , . . . , 6 g ) T denotes the vector containing all unknown parameters. 

Computation of the ML estimates of the model parameters is typically achieved through the 
EM algorithm. Under the EM framework, the observed data vector is regarded as incomplete, 
and latent component labels (and possibly other latent variables as needed) are introduced. The 
unobservable component labels z h j are defined as binary indicator variables, which takes the 
value of one when observation y^ belongs to the hth component, and zero otherwise. The E-step 
computes the so-called Q-function, which is the conditional expectation of the log likelihood 
function given the observed data, using the current fit for ^ . In the M-step, parameters are 
updated by maximizing the Q-function obtained from the E-step. The algorithm then proceeds 
by alternating the E- and M-steps until its likelihood increases by an arbitrary small amount 
in the case of convergence of the sequence of likelihood values. 

3.1 Finite mixtures of multivariate skew normal distributions 

With reference to (14), the pdf of a ^-component finite mixture of restricted multivariate skew 
normal (FM-rMSN) distributions is given by 

9 

f(y,*) = Y,*hf{v,l* h ,'E h ,5 h ), (50) 
h=i 
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Model 


Algorithm 


References 


rMST 


FM-rMST 
FM-rMST 
FM-SNI-ST 
FM-A-MST 


EM with OSL 
traditional EM 

ECME 
Bayesian EM 


Pyne et al. (2009) 
Vrbik and McNicholas (2012) 
Cabral et al. (2012) 
Fruhwirth-Schnatter and Pyne (2010) 


uMST 


FM-uMST 
FM-uMST 
FM-uMST 


MCEM 
EM with OSL 
ECME 


Lin (2009) 
Lee and McLachlan (2011) 
Lee and McLachlan (2012) 



Table 11: EM algorithms for fitting restricted and unrestricted forms of multivariate skew 
t-mixture models. 

where f(y; ^i h) Y, h) 8 h ) refers to the rMSN density (14). At the (k + l)th iteration, the E-step 
requires the computation of the conditional expectations 

e Si = { U i I Vj> z hj = 1} , (51) 

= V)Wl»i-% = l}. (52) 

where Uj \ Zhj = 1 ~ HN(0, 1). Simple closed-form expressions for the E- and M-steps of the 
EM algorithm for fitting mixtures of restricted forms of MSN distributions can be obtained. 
Pyne et al. (2009), Cabral et al. (2012) and Fruhwirth-Schnatter and Pyne (2010) studied, 
respectively, finite mixtures of the rMSN, SNI-SN, and A-MSN distributions, the latter from a 
Bayesian perspective (see Table 10). The closed-form EM implementations for FM-rMSN and 
FM-SNI-SN are available publicly from the R packages EMMIX-skew (Wang, 2009) and mixsmsn 
(Prates et al., 2011). On closer examination of the EM algorithm provided by Pyne et al. (2009) 
and Cabral et al. (2012), it is not difficult to show that their expressions for the E- and M-steps 
are identical, after an appropriate change in the parameterization as described in Section 2.1.1 
For the unrestricted case, Lin (2009) provided an implementation of the EM algorithm for 
fitting the FM-uMSN model. The conditional expectations required at the E-step are equivalent 
to (51) and (52), except the latent variable Uj is replaced by a multivariate equivalent. Closed- 
form expressions were also achieved for the FM-uMSN model. This, however, inevitably results 
in higher computational cost. Whereas (51) and (52) can be written in terms of the (univariate) 
t-distribution function for the restricted case, the unrestricted case requires the computation 
of the multivariate equivalent. 

3.2 Finite mixtures of multivariate skew t-distribution 

The density of a finite mixture of restricted multivariate skew t (FM-rMST) distributions is 
given by 

9 

/(?/;*) = ^2^hf(y,fJ- h ,^h,S h ,u h ), (53) 
h=i 
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where f(y; fj, h , T, h , Sh, Vh) refers to the rMST density (43). The necessary conditional expecta- 
tions required on the E-step at the ik + l)th iteration are 




E*w {log(W^) | y^Zhj = 1} , 
E^k) {Wj | yj,z h j = 1} , 



Eyw {WjUj | yj,z hj = 1} , 
{WjUf | y^Zhj = 1} , 



(54) 
(55) 
(56) 
(57) 



where Uj \ Wj, z h j = 1 ~ HN(0, 1) and Wj \ Zhj = 1 ~ gamma (z/^/2, u h /2). Simple closed-form 
expressions for the E- and M-steps of the EM algorithm for fitting mixtures of restricted forms 
of MST distributions can be obtained. Pyne et al. (2009) (c.f. Wang et al. (2009)), Fruhwirth- 
Schnatter and Pyne (2010), Cabral et al. (2012), and Vrbik and McNicholas (2012) studied, 
respectively, finite mixtures of the rMST, A-MST, SNI-ST, and rMST distributions (see Table 



In Lee and McLachlan (2012), it is pointed out that the EM algorithms for fitting the 
FM-rMSN distribution (in particular, the expressions for (55)-(57)) obtained by Pyne et al. 
(2009) and Vrbik and McNicholas (2012) are equivalent. More specifically, the former uses 
expressions for the moments of a (univariate) truncated t-distribution to solve (56) and (57), 
and the latter expresses them in terms of hypergeometric functions. As in the case of the 
FM-rMSN and FM-SNI-SN distributions, the expressions (55)-(57) for the FM-SNI-ST model 
are identical to that for the FM-rMST model. The only difference between the two algorithm 
lies in the estimation of the degrees of freedom, where Pyne et al. (2009) and Wang et al. 
(2009) use a one-step-late (OSL) approach to compute the conditional expectation (54), while 
Cabral et al. (2012) employ an ECME algorithm. However, it should be noted that the ECME 
algorithm presented in Cabral et al. (2012) assumes the degrees of freedom to the same across all 
components, whereas such a restriction was not imposed when applying the algorithm provided 
by Pyne et al. (2009). Again, the implementations of the EM algorithm for fitting FM-rMST 
and FM-SNI-ST are available from the R packages EMMIX-skew and mixsmsn. 

In the case of the FM-uMST model, Lin (2009) and Lee and McLachlan (2011) have put 
forward two versions of an EM algorithm for fitting the unrestricted MST distribution. The 
former implemented a Monte Carlo (MC) E-step for calculating the conditional expectations 
similar to (54)- (57), but for the unrestricted case. The latter employed the OSL approach to 
calculate (54), and expressed (56) and (57) in terms of moments of the multivariate truncated 
t-distribution. Lee and McLachlan (2012) have demonstrated that the second approach have 
led to significant reduction in computation time and improvement in accuracy. They have also 
sketched an exact implementation of the EM algorithm for the FM-uMST model, which results 
in an ECME implementation similar to the algorithm provided by Cabral et al. (2012) for the 
restricted model. 

4 Clustering DLBCL samples 

To demonstrate the performance of the multivariate skew mixture models mentioned in Section 
3, we consider the clustering of a trivariate Diffuse Large B-cell Lymphoma (DLBCL) dataset 
provided by the British Columbia Cancer Agency. The data contain over 3000 cells derived 
from the lymph nodes of patients diagnosed with DLBCL. Each sample was stained with three 
markers, namely, CD3, CD5, and CD 19. The task is to cluster the cells into three groups. 
Hence, we fit a three-component FM-uMST model to the data. For comparison, we include 
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Model 


FM-uMST 


FM-rMST 


FM-MNIG 


FM-MSAL 


Misclassification rate 


0.0405 


0.0638 


0.1838 


0.2674 



Table 12: Misclassification rates for various multivariate mixture models on the DLBCL dataset. 
Cells identified as dead cells were not included in the calculation of error rate. 

the results of two non-elliptically contoured mixture models, finite mixture of multivariate 
normal-inverse-Gaussian (FM-MNIG) distributions and finite mixture of multivariate shifted 
asymmetric Laplace (FM-MSAL) distributions. 

The MNIG distribution is a flexible parametric family with four parameters (Karlis and 
Santourian, 2009). Like the skew t-distribution, the MNIG distribution can accommodate 
skewness and heavy tails in the data. Computation of the ML estimates of the parameters 
of the model is carried out by the EM algorithm, with closed-form E- and M-steps involving 
modified Bessel functions. The MSAL distribution is another alternative to the skew normal and 
skew t-distribution. As a three-parameters distribution, the MSAL distribution has parameters 
that controls its location, scale and skewness. The EM algorithm for fitting mixtures of MSAL 
distributions is computationally straightforward compared to that for the FM-MNIG model 
and skew mixture distributions (Franczak et al., 2012). 

A scatterplot of the data is shown in Figure 1(a), where the dots are coloured according 
to the clustering provided by human experts, which is taken as the 'true' cluster labels. Fig- 
ure l(b)-(e) shows the density contours of the components of the fitted FM-uMST, FM-rMST, 
FM-MNIG, and FM-MSAL models respectively, which are displayed with matching colours to 
Figure 1(a). To assess the performance of these algorithms, we calculated the rate of mis- 
classification against the 'true' results, given by choosing among the possible permutations of 
the cluster labels the one that gives the lowest value. A lower misclassification or error rate 
indicates a closer match between the true labels and the cluster labels given by the candidate 
algorithm. Note that dead cells were removed before evaluating the misclassification rate. From 
Table 12, the multivariate skew t- mixture models clearly outperforms the other methods in this 
dataset. This is also evident in Figure 1, where the component contours of the FM-uMST and 
FM-rMST models resembles quite well the shape of the clusters identified by manual gating. 
The results from Table 12 reveals that the unrestricted model is slightly more accurate than 
the restricted variant. Both FM-MNIG and FM-MSAL has disappointing performances, with 
the FM-MNIG model failing to separate between the middle (green) and lower (red) clusters, 
while the FM-MSAL model have difficulty in separating all three clusters. 

5 Concluding Remarks 

We have presented a schematic way to classify multivariate skew distributions into four types, 
namely, the 'restricted', 'unrestricted', 'extended' and 'generalized' forms. Concerning the 
use of the terminology 'restricted' and 'unrestricted', it should be noted that the restricted 
skew forms are not nested within the corresponding unrestricted forms, with these two forms 
coinciding only in the univariate case. However, these two forms are both special cases of the 
extended form, which itself is a special case of the generalized form. 

Current work on finite mixture of skew distributions have investigated only the restricted 
and unrestricted forms of multivariate skew distributions. Mixtures based on skew distributions 
of more general forms would be of interest. 
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aiL« in> 



[d) FM-MNIG 



FULO& 106 

(e) FM-M3AL 



Figure 1: DLBCL dataset: Automated gating results of DLBCL sample using five different finite 
mixture models. The population of 3290 cells were stained with three fluorescence reagents - 
CD3 (FL1.LOG), CD5 (FL2.LOG), CDf9 (FL4.LOG). (a) manual expert clustering of the 
DLBCL into three groups; (b) the fitted component contours of the three-component FM- 
uMST model; (c) the contours of the component densities of the fitted restricted (FM-rMST) 
model using (emmix); (d) the contour plot of the fitted the FM-MNIG model; (e) the fitted 
component contours of FM-MSAL model. 
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